Tailoring Fuzzy C-Means Clustering Algorithm for Big Data Using Random Sampling and Particle Swarm Optimization
Abstract
Clustering is one of the most common data mining techniques and has been widely applied in many fields; within it, fuzzy clustering can reflect the real world more objectively. Fuzzy C-Means (FCM), one of the most popular fuzzy clustering algorithms, combines fuzzy theory with the K-Means clustering algorithm. However, FCM has several shortcomings: it is very sensitive to initial conditions, such as the choice of initial clusters; its convergence speed is limited; and a globally optimal solution is hard to guarantee. In big data scenarios in particular, the algorithm is slow overall, and running it on the entire original dataset is difficult. To address these challenges, we propose a modified FCM based on Particle Swarm Optimization (PSO). We also present a multi-round random sampling method for the big data problem, which simulates clustering on the original big dataset by approximating it with the clustering results obtained on sample datasets. Our experiments show that both the modified FCM using PSO and the multi-round sampling strategy are efficient and effective.
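For orientation, the standard FCM iteration that the abstract builds on can be sketched as below. This is a minimal illustration of textbook FCM only, written with the Python standard library; the function name `fcm`, its parameters, and the random membership initialization are assumptions made for this sketch. The PSO-based modification and the multi-round sampling strategy proposed in the paper are not described in enough detail in the abstract to reproduce, so they are not shown.

```python
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook Fuzzy C-Means on a list of equal-length numeric tuples.

    Hypothetical helper for illustration; not the paper's modified algorithm.
    Returns (centers, u), where u[i][k] is the membership of point i in cluster k.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    # FCM's sensitivity to this starting point is the issue the abstract raises.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))

        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            d2 = [
                max(sum((points[i][d] - ck[d]) ** 2 for d in range(dim)), 1e-12)
                for ck in centers
            ]
            inv = [x ** (-1.0 / (m - 1.0)) for x in d2]  # d2 is already squared
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break

    return centers, u


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, memberships = fcm(data, c=2)
    print(centers)
```

Every iteration touches every point for every cluster, which is why running plain FCM on a full big dataset is costly and why clustering on samples, as the abstract proposes, is attractive.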
Similar resources
OPTIMIZATION OF FUZZY CLUSTERING CRITERIA BY A HYBRID PSO AND FUZZY C-MEANS CLUSTERING ALGORITHM
This paper presents an efficient hybrid method, namely fuzzy particle swarm optimization (FPSO) and fuzzy c-means (FCM) algorithms, to solve the fuzzy clustering problem, especially for large sizes. When the problem becomes large, the FCM algorithm may result in uneven distribution of data, making it difficult to find an optimal solution in reasonable amount of time. The PSO algorithm does find a go...
Fuzzy Particle Swarm Optimization Algorithm for a Supplier Clustering Problem
This paper presents a fuzzy decision-making approach to a supplier clustering problem in a supply chain system. In recent years, determining suitable suppliers in the supply chain has become a key strategic consideration. However, the nature of these decisions is usually complex and unstructured. In general, many quantitative and qualitative factors, such as quality, price, and fl...
The Research of Building Fuzzy C-Means Clustering Model based on Particle Swarm Optimization
The Particle Swarm Optimization algorithm is an iterative optimization tool: the system is initialized with a group of random solutions and then searches iteratively for the optimal value. The basic idea of the fuzzy C-means clustering algorithm is to determine the degree to which each data sample belongs to each cluster, and to group the samples into clusters according to their degrees of membership. Favor opt...
Optimization and design of Adaptive Neuro-Fuzzy Inference System using Particle Swarm Optimization and Fuzzy C-Means Clustering to predict the scour after bucket spillway
Additionally, if the materials downstream of the bucket spillway are erodible, the ogee spillway is likely to overturn over time. Therefore, predicting the scour after a bucket spillway is important. In this study, the scour depths downstream of a bucket spillway are modeled using a new meta-heuristic model. This model is developed by combining the Adaptive Neuro-Fuzzy Infere...
Fuzzy clustering of time series data: A particle swarm optimization approach
With the rapid development of information gathering technologies and access to large amounts of data, methods are needed for analyzing data and extracting useful information from large raw datasets, and data mining is an important way of solving this problem. Clustering analysis, the most commonly used function of data mining, has attracted many researchers in computer science. Because o...